Information Gathering and Reward Exploitation of Subgoals for POMDPs
Abstract
Planning in large partially observable Markov decision processes (POMDPs) is challenging especially when a long planning horizon is required. A few recent algorithms successfully tackle this case but at the expense of a weaker information-gathering capacity. In this paper, we propose Information Gathering and Reward Exploitation of Subgoals (IGRES), a randomized POMDP planning algorithm that leverages information in the state space to automatically generate “macro-actions” to tackle tasks with long planning horizons, while locally exploring the belief space to allow effective information gathering. Experimental results show that IGRES is an effective multi-purpose POMDP solver, providing state-of-the-art performance for both long horizon planning tasks and information-gathering tasks on benchmark domains. Additional experiments with an ecological adaptive management problem indicate that IGRES is a promising tool for POMDP planning in real-world settings.
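The abstract contrasts planning in the state space (where subgoals and macro-actions are generated) with locally exploring the belief space, i.e. distributions over hidden states. As background for that distinction, the sketch below shows the standard discrete POMDP belief update (Bayes filter) on a toy two-state model. This is generic machinery that belief-space planners such as IGRES build on, not the IGRES algorithm itself, and the model numbers are invented for illustration.

```python
import numpy as np

# Standard discrete POMDP belief update (Bayes filter):
#   b'(s') ∝ Z[a][s'][o] * sum_s T[a][s][s'] * b(s)
# Generic belief-space machinery, not the IGRES algorithm itself.

def update_belief(belief, action, obs, T, Z):
    predicted = belief @ T[action]            # predict step: marginalize over transitions
    updated = Z[action][:, obs] * predicted   # correct step: weight by observation likelihood
    return updated / updated.sum()            # renormalize to a probability distribution

# Toy model (invented for illustration): 2 states, 1 action, 2 observations.
T = np.array([[[0.9, 0.1],
               [0.2, 0.8]]])                  # T[a][s][s']
Z = np.array([[[0.8, 0.2],
               [0.3, 0.7]]])                  # Z[a][s'][o]
b = np.array([0.5, 0.5])                      # uniform initial belief
print(update_belief(b, action=0, obs=0, T=T, Z=Z))
```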
Similar Papers
Myopic Policy Bounds for Information Acquisition POMDPs
This paper addresses the problem of optimal control of robotic sensing systems aimed at autonomous information gathering in scenarios such as environmental monitoring, search and rescue, and surveillance and reconnaissance. The information gathering problem is formulated as a partially observable Markov decision process (POMDP) with a reward function that captures uncertainty reduction. Unlike ...
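One common way to make an uncertainty-reduction reward concrete (an illustrative choice, not necessarily this paper's exact definition) is the expected drop in belief entropy from taking an action, which equals the mutual information between the next state and the observation. A runnable sketch on an invented toy model, reusing the conventions from the belief-update example above:

```python
import numpy as np

def entropy(b):
    b = b[b > 0]                               # skip zero entries so log is defined
    return -np.sum(b * np.log(b))

def expected_entropy_reduction(belief, action, T, Z):
    """E_o[H(predicted) - H(posterior)]: expected information gain of `action`."""
    predicted = belief @ T[action]             # predicted next-state belief
    expected_posterior_entropy = 0.0
    for obs in range(Z.shape[2]):
        likelihood = Z[action][:, obs]         # P(o | s') for all s'
        p_obs = float(likelihood @ predicted)  # P(o | b, a)
        if p_obs > 0:
            posterior = likelihood * predicted / p_obs
            expected_posterior_entropy += p_obs * entropy(posterior)
    return entropy(predicted) - expected_posterior_entropy

# Toy model (invented): same T[a][s][s'], Z[a][s'][o] conventions as above.
T = np.array([[[0.9, 0.1], [0.2, 0.8]]])
Z = np.array([[[0.8, 0.2], [0.3, 0.7]]])
print(expected_entropy_reduction(np.array([0.5, 0.5]), action=0, T=T, Z=Z))
```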
Online Planning in Continuous POMDPs with Open-Loop Information-Gathering Plans
This paper studies the convergence properties of a receding-horizon information-gathering strategy used in the recently presented RBSR planner for continuous POMDPs. The planner uses a combination of randomized exploration, particle filtering, and goal-seeking heuristic policies to achieve scalability to high-dimensional continuous spaces. We show that convergence is ensured in a subclass of pr...
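The entry above relies on particle filtering to represent beliefs over continuous states. Below is a generic bootstrap particle-filter step (a standard technique, shown on an invented 1-D toy model; it is not RBSR's actual implementation): propagate particles through the transition model, weight them by the observation likelihood, and resample.

```python
import numpy as np

rng = np.random.default_rng(0)

def particle_filter_step(particles, action, obs, transition, obs_likelihood):
    """One bootstrap particle-filter update: propagate, weight, resample."""
    propagated = np.array([transition(p, action) for p in particles])
    weights = np.array([obs_likelihood(obs, p) for p in propagated])
    weights /= weights.sum()
    idx = rng.choice(len(propagated), size=len(propagated), p=weights)
    return propagated[idx]                     # resampled, equally weighted particles

# Invented 1-D toy model: the state drifts by the action plus Gaussian noise,
# and the observation is the state corrupted by Gaussian noise.
def transition(s, a):
    return s + a + rng.normal(0.0, 0.1)

def obs_likelihood(o, s, sigma=0.5):
    return np.exp(-0.5 * ((o - s) / sigma) ** 2)

particles = rng.normal(0.0, 1.0, size=500)                 # prior belief as samples
particles = particle_filter_step(particles, 1.0, 1.2, transition, obs_likelihood)
print(particles.mean())                                    # posterior mean estimate
```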
Reinforcement Learning in POMDPs Without Resets
We consider the most realistic reinforcement learning setting in which an agent starts in an unknown environment (the POMDP) and must follow one continuous and uninterrupted chain of experience with no access to “resets” or “offline” simulation. We provide algorithms for general connected POMDPs that obtain near optimal average reward. One algorithm we present has a convergence rate which depends exponen...